Abstract

In this paper, we present the computational task-management tool Ganga, which allows for the specification, submission, bookkeeping and post-processing of computational tasks on a wide set of distributed resources. Ganga has been developed to solve a problem increasingly common in scientific projects, which is that researchers must regularly switch between different processing systems, each with its own command set, to complete their computational tasks. Ganga provides a homogeneous environment for processing data on heterogeneous resources. We give examples from High Energy Physics, demonstrating how an analysis can be developed on a local system and then transparently moved to a Grid system for processing of all available data. Ganga has an API that can be used via an interactive interface, in scripts, or through a GUI. Specific knowledge about types of tasks or computational resources is provided at run-time through a plugin system, making new developments easy to integrate. We give an overview of the Ganga architecture, give examples of current use, and demonstrate how Ganga can be used in many different areas of science.
1 Introduction

Scientific communities are using a growing number of distributed systems, from local batch systems and community-specific services to generic, global Grid infrastructures. Users may debug applications using a desktop computer, then perform small-scale application testing using local resources, and finally run at full scale using globally distributed Grids. Sometimes new resources are made available to users through systems previously unknown to them, and significant effort may be required to become familiar with these systems' interfaces and idiosyncrasies. The time cost of mastering application configuration, tracking computational tasks, and archiving and accessing results is prohibitive for end-users unless they are supported by appropriate tools.

Ganga is an easy-to-use frontend for the configuration, execution, and management of computational tasks. The implementation uses an object-oriented design in Python [1]. It started as a project to serve as a Grid user interface for data analysis within the ATLAS [3] and LHCb [4] experiments in High Energy Physics, where large communities of physicists need access to Grid resources for data mining and simulation tasks. A list of projects that have supported the development of Ganga may be found in section 10.

Ganga provides a simple but flexible programming interface that can be used either interactively at the Python prompt, through a Graphical User Interface (GUI), or programmatically in scripts. The job is the central concept, as it contains the full description of a computational task, including: the code to execute; input data for processing; data produced by the application; the specification of the required processing environment; post-processing tasks; and metadata for bookkeeping. The purpose of Ganga can then be seen as making it easy for a user to create, submit and monitor the progress of jobs. Ganga keeps track of all jobs and their status through a repository that archives all information between independent Ganga sessions. It is possible to switch between executing a job on a local computer and executing it on the Grid by changing a single parameter of a job object. This simplifies the progression from rapid prototyping on a local computer, to small-scale tests on a local batch system, to the analysis of a large dataset using Grid resources.
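The single-parameter switch can be illustrated with a minimal Python sketch. The classes below (`ExecutableSpec`, `LocalBackend`, `GridBackend`, `Job`) are hypothetical stand-ins written for this illustration, not Ganga's own GPI classes, and `submit()` only returns a description of what would be run rather than contacting any real system.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class ExecutableSpec:
    """Hypothetical application: what to run."""
    exe: str
    args: List[str] = field(default_factory=list)

    def command(self) -> str:
        return " ".join([self.exe, *self.args])


class Backend(Protocol):
    """Anything that can describe how it would run an application."""
    def submit(self, app: ExecutableSpec) -> str: ...


class LocalBackend:
    """Hypothetical backend: run on the user's own machine."""
    def submit(self, app: ExecutableSpec) -> str:
        return f"local: {app.command()}"


@dataclass
class GridBackend:
    """Hypothetical backend: send the job to a Grid site (only described here)."""
    site: str = "any"

    def submit(self, app: ExecutableSpec) -> str:
        return f"grid[{self.site}]: {app.command()}"


@dataclass
class Job:
    application: ExecutableSpec
    backend: Backend = field(default_factory=LocalBackend)

    def submit(self) -> str:
        # The job never inspects the backend type; it simply delegates.
        return self.backend.submit(self.application)


job = Job(ExecutableSpec("analysis.py", ["--events", "100"]))
print(job.submit())            # local: analysis.py --events 100
job.backend = GridBackend()    # the single change needed to move to the Grid
print(job.submit())            # grid[any]: analysis.py --events 100
```

The design choice being illustrated is delegation: the job holds a reference to a backend object and never branches on its type, so replacing that one attribute is enough to redirect the job.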

In Ganga, the user has programmatic access through an Application Programming Interface (API), and has access to applications locally for quick turnaround during development.

Ganga is a user- and application-oriented layer above existing job submission and management technologies, such as Globus [5], Condor [6], Unicore [7] or gLite [8]. Rather than replacing the existing technologies, Ganga allows them to be used interchangeably, using a common interface as the interoperability layer.

Ganga can be made available to a user community with a high level of customisation. For example, an expert within a field can implement a custom application class describing the specific computational task. The class encapsulates all the low-level setup of the application, which is always the same, and exposes only a few parameters for configuring a particular task. Through the plugin system provided in Ganga, this expert customisation is integrated seamlessly with the core of Ganga at runtime, and can be used by an end-user to process tasks with little knowledge of the interfaces of Grid or batch systems. Issues such as differences in data access between jobs executing locally and on the Grid are similarly hidden.

Ganga may be used as a job management system integrated into a larger system. In this case Ganga acts as a library for job submission and control. In particular, Ganga may be used as a building block for the implementation of Grid Portals, which allow users access to Grid functionality through their web browsers in a simplified way. These portals are normally domain specific and allow users of a distributed application to run it on the Grid without needing to know much about Grid tools.

Ganga is licensed under the GNU General Public License² and is available for download from the project website: http://www.cern.ch/ganga. Installing Ganga is straightforward and requires neither privileged access nor any server configuration. The Ganga installer script provides a self-contained package, and most of the external dependencies are resolved automatically. However, Ganga generally does not attempt to install Grid or batch submission tools or the application software³. Typically such software is installed and managed separately by system administrators. Simple configuration files allow customisation and configuration of Ganga at the level of site, workgroup and user.

² Ganga is licensed under GPL version 2 or, if preferred by the user, any later version. Details of the GPL are available at http://www.gnu.org/licenses/gpl.html.

³ Some external dependencies, such as NorduGrid submission tools, are automatically installed.

Between January 2007 and December 2008, Ganga was used at 150 sites around the world, with 2000 unique users running about 250,000 Ganga sessions⁴.

In this paper, we describe in section 2 the overall functionality, in section 3 details of the implementation, and in section 4 how the progress of jobs is monitored. Section 5 gives an overview of the Graphical User Interface. In sections 6 and 7 we discuss how Ganga is customised for specific user communities. Interfacing and embedding Ganga in other frameworks is presented in section 8. In Appendix A we provide some examples of how the API in Ganga can be used.



2 Functionality

Ganga is a user-centric tool that allows easy interaction with heterogeneous computational environments, configuration of the applications and coherent organisation of jobs. Ganga functionality may be accessed by a user through any of several interfaces: a text-based command line in Python, a file-based scripting interface and a graphical user interface (GUI). This reflects the different working styles in different user communities, and addresses various usage scenarios, such as using the GUI for training new users, the command line to exploit advanced use-cases, and scripting for automation of repetitive tasks. For Ganga sessions, the current usage fractions are 55%, 40% and 5% for the interactive prompt, scripts and GUI, respectively. As shown in Fig. 1, the three user interfaces are built on top of the Ganga Public Interface (GPI), which in turn provides access to the Ganga core implementation.

A job in Ganga is constructed from a set of components. All jobs are required to have an application component and a backend component, which define respectively the software to be run and the processing system to be used. Many jobs also have input and output dataset components, specifying data to be read and produced. Finally, computationally intensive jobs may have a splitter component, which provides a mechanism for dividing a job into independent subjobs, and a merger component, which allows for the aggregation of subjob outputs. The overall component structure of a job is illustrated in Fig. 2.

By default, the GPI exposes a simplified, top-level view suitable for most users in their everyday work, but at the same time allows the details of underlying systems to be exposed if needed. An example interactive Ganga session using the GPI is given in Appendix A.

Fig. 1. The overall architecture of Ganga. The user interacts with the Ganga Public Interface (GPI) via the Graphical User Interface (GUI), the Command-Line Interface in Python (CLIP), or scripts. Plugins are provided for different application types and backends. All jobs are stored in the repository.



Ganga prevents the user from modifying a submitted job. However, a copy of the job may easily be created, and the copy can be modified. Ganga monitors the evolution of submitted jobs and categorises them into the simplified states submitted, running, completed, failed or killed.
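The rule that submitted jobs are frozen while copies remain editable can be sketched as a small state machine. `JobRecord`, `Status` and the transition table are illustrative assumptions rather than Ganga's implementation; in particular, the extra `NEW` state and the exact set of allowed transitions are choices made only for this example.

```python
import copy
from enum import Enum


class Status(Enum):
    NEW = "new"
    SUBMITTED = "submitted"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    KILLED = "killed"


# Assumed transitions between the simplified states; terminal states have none.
TRANSITIONS = {
    Status.NEW: {Status.SUBMITTED},
    Status.SUBMITTED: {Status.RUNNING, Status.FAILED, Status.KILLED},
    Status.RUNNING: {Status.COMPLETED, Status.FAILED, Status.KILLED},
}


class JobRecord:
    """Hypothetical job record: editable only until it is submitted."""

    def __init__(self, **settings):
        self._settings = dict(settings)
        self.status = Status.NEW

    def set(self, key, value):
        if self.status is not Status.NEW:
            raise PermissionError(
                f"job is {self.status.value}; copy it to make changes")
        self._settings[key] = value

    def get(self, key):
        return self._settings[key]

    def advance(self, new_status):
        # Refuse any transition not listed in TRANSITIONS.
        if new_status not in TRANSITIONS.get(self.status, set()):
            raise ValueError(
                f"{self.status.value} -> {new_status.value} is not allowed")
        self.status = new_status

    def copy(self):
        # The copy gets independent settings and starts again in the NEW state.
        return JobRecord(**copy.deepcopy(self._settings))
```

For instance, after `j.advance(Status.SUBMITTED)` a call to `j.set(...)` raises `PermissionError`, whereas `j.copy().set(...)` succeeds and leaves the original untouched.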

All job objects are stored in a job repository database, and the input and output files associated with the jobs are stored in a file workspace. Both the repository and the workspace may be in a local filesystem or on a remote server.



A large computational task may be split automatically into a number of subjobs according to user-defined criteria, and the output merged at a later stage. Each subjob executes independently, and the merging of the output takes place when all subjobs have finished. The submission of subjobs is automatically optimised if the backend component supports bulk job submission. For example, when submitting to the gLite workload management system [8], the job collection mechanism is used transparently to the user. Job splitting functionality provides a flat list of subjobs suitable for parallel processing of fully independent workloads. However, certain backends allow users to make use of more sophisticated parallelisation schemes, for example the Message Passing Interface (MPI) [8]. In this case, Ganga may be used to manage collections of subjobs corresponding to MPI processes.

Fig. 2. A set of components in Ganga can be combined to form a complete job. The application to run and the backend where it will run are mandatory, while all other components are optional.

The GPI allows frequently used job configurations to be stored as templates, 
so that they may easily be reused, and allows jobs to be labelled and organised 
in a hierarchical jobtree. 

Ganga has built-in support for handling user credentials, including classic Grid proxies, proxies with extensions for a Virtual Organisation Management Service (VOMS) [12], and Kerberos [13] tokens for access to an Andrew filesystem (AFS) [14]. A user may renew and destroy the credentials directly using the GPI. Ganga gives an early warning to a user if the credentials are about to expire. The minimum credential validity and other aspects of the credential management are fully configurable.
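A possible shape for the early-warning check is shown below. The `Credential` record, the `credential_state` function and the default thresholds (one hour minimum validity, six hours warning window) are assumptions for illustration; the text only states that such limits are configurable.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Credential:
    kind: str              # e.g. "grid-proxy" or "kerberos" (illustrative labels)
    expires_at: datetime


def credential_state(cred: Credential, now: datetime,
                     min_validity: timedelta = timedelta(hours=1),
                     warn_before: timedelta = timedelta(hours=6)) -> str:
    """Return 'invalid', 'warn' or 'ok'; both thresholds are assumed defaults."""
    remaining = cred.expires_at - now
    if remaining < min_validity:
        return "invalid"   # operations needing this credential should be blocked
    if remaining < warn_before:
        return "warn"      # early warning: the user should renew soon
    return "ok"
```

A monitoring loop could call this periodically for each credential and warn the user or disable credential-dependent operations depending on the result.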

Ganga supports multiple security models. For local and batch backends, the authentication and authorisation of the users is based on the local security infrastructure, including user name and network authentication protocols such as Kerberos. The Grid Security Infrastructure (GSI) [15] provides security across organisational boundaries for the Grid backends. Different security models are encapsulated in pluggable components, which may be used simultaneously in the same Ganga session.



A Robot has been implemented for repetitive use-cases. It is a GPI script that periodically executes a series of actions in the context of a Ganga session. These actions are defined by implementations of an action interface. Without any programming, the Robot can be configured using existing action implementations to submit saved jobs, wait for the jobs to complete, extract data about the jobs to an XML file, generate plain-text or HTML summary reports, and email the reports to interested parties. Custom actions can easily be added, either by extending or aggregating the existing implementations or by implementing the action interface directly, allowing for a wide variety of repetitive use-cases. An example is given in section 6.1.
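The Robot's design, a driver running a configurable chain of actions behind a common interface, might be sketched as follows. `Action`, `SubmitSavedJobs`, `SummaryReport`, `Sequence` and `Robot` are hypothetical names; the two concrete actions only manipulate a dictionary instead of submitting jobs or sending e-mail.

```python
import time
from abc import ABC, abstractmethod
from typing import Dict, List


class Action(ABC):
    """Hypothetical action interface: each action reads and updates a shared context."""

    @abstractmethod
    def execute(self, context: Dict) -> None:
        ...


class SubmitSavedJobs(Action):
    def execute(self, context):
        # Stand-in for submitting saved jobs: record their names as submitted.
        context["submitted"] = list(context.get("saved", []))


class SummaryReport(Action):
    def execute(self, context):
        context["report"] = f"{len(context.get('submitted', []))} job(s) submitted"


class Sequence(Action):
    """Aggregates existing actions so they can be reused as a single action."""

    def __init__(self, *actions: Action):
        self.actions = actions

    def execute(self, context):
        for action in self.actions:
            action.execute(context)


class Robot:
    def __init__(self, actions: List[Action]):
        self.actions = actions

    def run_once(self, context: Dict) -> Dict:
        for action in self.actions:      # actions run in the configured order
            action.execute(context)
        return context

    def run(self, context: Dict, iterations: int, interval_s: float) -> Dict:
        """Repeat the action chain periodically, sleeping between rounds."""
        for i in range(iterations):
            self.run_once(context)
            if i < iterations - 1:
                time.sleep(interval_s)
        return context
```

`Sequence` shows the aggregation route for custom actions; subclassing `Action` directly is the other route.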

Details of the different kinds of Ganga components are given below, along with generic examples. More specialised components, designed for a particular problem domain, are considered in sections 6 and 7.

2.1 Application components

The application component describes the type of computational task to be performed. It allows the characteristics and settings of some piece of software to be defined, and provides methods specifying actions to be taken before and after a job is processed. The pre-processing (configuration) step typically involves examination of the application properties, and may derive secondary information. For example, intermediate configuration files for the application may be created automatically. The post-processing step can be useful for validation tasks such as determining the validity of the application output.

The simplest application component (Executable) has three properties:

exe : the path to an executable binary or script;
args : a list of arguments to be passed to the executable;
env : a dictionary of environment variables and the values they should be assigned before the executable is run.

The configuration method carries out integrity checks, for example ensuring that a value has been assigned to the exe property.
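A minimal sketch of such an application component, under the assumption that the configuration step simply validates the properties and returns the command line, could look like this. `ExecutableApp`, `configure` and `runtime_env` are illustrative names, not Ganga's code.

```python
import os
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExecutableApp:
    """Sketch of an application with the three properties listed above."""
    exe: str = ""
    args: List[str] = field(default_factory=list)
    env: Dict[str, str] = field(default_factory=dict)

    def configure(self) -> List[str]:
        """Integrity checks run before submission; returns the command line."""
        if not self.exe:
            raise ValueError("the exe property must be set before submission")
        bad = [a for a in self.args if not isinstance(a, str)]
        if bad:
            raise TypeError(f"arguments must be strings, got {bad!r}")
        return [self.exe, *self.args]

    def runtime_env(self) -> Dict[str, str]:
        """Environment for the executable: the current one overlaid with env."""
        return {**os.environ, **self.env}
```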

2.2 Backend components

A backend component contains parameters describing the behaviour of a processing system. The list of parameters can vary significantly from one system to another, but can include, for example, a queue name, a list of requested sites, the minimum memory needed and the processing time required. In addition, some parameters hold information that the system reports back to the user, for example the system-specific job identifier and status, and the machine where a job executed.

A backend component provides methods for submitting jobs and, when needed, for cancelling them after submission. It also provides methods for updating information on job status, for retrieving the output of completed jobs and for examining files produced while a job is running.

Backend components have been implemented for a range of widely used processing systems, including: local host, batch systems (Portable Batch System (PBS) [16], Load Sharing Facility (LSF) [17], Sun Grid Engine (SGE) [18], and Condor [19]), and Grid systems, for example those based on gLite [8], ARC [20] and OSG [21]. A Remote backend component allows jobs to be launched directly on remote machines using ssh.

As an example, the batch backend component defines a single property that may be set by the user:

queue : name of the queue to which the job should be submitted (the system default queue is used if this is left unspecified);

and defines three properties for storing system information:

id : job identifier;
status : status as reported by the batch system;
actualqueue : name of the queue to which the job has been submitted.
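The split between user-settable and system-reported properties can be illustrated with a hypothetical batch backend. `Backend`, `BatchSketch` and the `"default"` queue name are assumptions; `submit()` only generates a placeholder identifier where a real plugin would call the batch system's own submission command.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from itertools import count
from typing import Optional

_next_id = count(1)   # stand-in for identifiers assigned by a real batch system


class Backend(ABC):
    """Hypothetical backend interface with some of the operations described above."""

    @abstractmethod
    def submit(self, script: str) -> str: ...

    @abstractmethod
    def kill(self) -> None: ...

    @abstractmethod
    def update_status(self, reported: str) -> None: ...


@dataclass
class BatchSketch(Backend):
    queue: str = ""                    # set by the user; "" means system default
    id: Optional[str] = None           # the remaining fields are filled in by the system
    status: Optional[str] = None
    actualqueue: Optional[str] = None

    DEFAULT_QUEUE = "default"          # class constant, not a dataclass field

    def submit(self, script: str) -> str:
        # A real plugin would invoke the batch system's submit command here.
        self.id = f"batch-{next(_next_id)}"
        self.actualqueue = self.queue or self.DEFAULT_QUEUE
        self.status = "queued"
        return self.id

    def kill(self) -> None:
        self.status = "killed"

    def update_status(self, reported: str) -> None:
        self.status = reported
```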

In addition, a remote-backend component allows a job defined in a Ganga session running on one machine to be submitted to a processing system known to a remote machine to which the user has access. For example, a user who has accounts on two clusters may submit jobs to the batch system of each from a single machine.

2.3 Dataset components

Dataset components generally define properties that uniquely identify a particular collection of data, and provide methods for obtaining information about it, for example its location and size. The details of how data collections are described can vary significantly from one problem domain to another, and the only generic dataset component in Ganga represents a null (empty) dataset. Other dataset components are specialised for use with a particular application, and so are discussed later.

A strict distinction is made between datasets and sandbox (job) files. The former are files or databases that are stored externally. The sandbox consists of files that are transferred from the user's filesystem together with the job. The sandbox mechanism is designed to handle small files (typically up to 10 MB), while datasets may be arbitrarily large.
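One way to enforce the sandbox/dataset distinction is to check the total sandbox size before submission. The sketch below assumes a hard 10 MB limit derived from the typical figure quoted above; `sandbox_sizes`, `check_sandbox` and `SANDBOX_LIMIT_BYTES` are hypothetical names.

```python
from pathlib import Path
from typing import Dict, Iterable

SANDBOX_LIMIT_BYTES = 10 * 1024 * 1024  # assumed limit, from the ~10 MB guideline


def sandbox_sizes(paths: Iterable[str]) -> Dict[str, int]:
    """Collect the on-disk size of each file that would travel with the job."""
    return {p: Path(p).stat().st_size for p in paths}


def check_sandbox(sizes: Dict[str, int], limit: int = SANDBOX_LIMIT_BYTES) -> int:
    """Return the total sandbox size, or raise if it is too large to ship with the job."""
    total = sum(sizes.values())
    if total > limit:
        biggest = max(sizes, key=sizes.get)
        raise ValueError(
            f"sandbox is {total} bytes (limit {limit}); consider registering "
            f"{biggest!r} as a dataset instead"
        )
    return total
```

In use, `check_sandbox(sandbox_sizes(["job.cfg", "script.py"]))` would be called on the files the user asked to ship, keeping large inputs on the dataset side.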



2.4 Splitter components

Splitter components allow the user to specify the number of subjobs to be created, and the way in which subjobs differ from one another. As an example, one splitter component (ArgSplitter) deals with executing the same task many times over, but changing the arguments of the application executable each time. It defines a single property:

args : list of sets of arguments to be passed to an application.

Specialised splitters deal with creating subjobs that process different parts of a dataset.
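The behaviour of an argument splitter can be sketched in a few lines of Python. `AppSpec` and `ArgSplitterSketch` are hypothetical classes that mimic the idea of ArgSplitter rather than reproduce it: each set of arguments yields an independent copy of the application.

```python
from copy import deepcopy
from dataclasses import dataclass, field
from typing import List


@dataclass
class AppSpec:
    """Hypothetical application description."""
    exe: str
    args: List[str] = field(default_factory=list)


@dataclass
class ArgSplitterSketch:
    """One subjob per set of arguments."""
    args: List[List[str]]

    def split(self, app: AppSpec) -> List[AppSpec]:
        subjobs = []
        for arg_set in self.args:
            sub = deepcopy(app)          # subjobs must not share mutable state
            sub.args = list(arg_set)
            subjobs.append(sub)
        return subjobs
```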



2.5 Merger components

Merger components deal with combining the output of subjobs. Typical output includes files containing data in a particular format, for example text strings or data representing histograms. As examples, one merger component (TextMerger) concatenates the files of standard output and error returned by a set of subjobs, and another (RootMerger) sums histograms produced in ROOT format [22]. Merging may be performed automatically in the background when Ganga retrieves the job output, or it may be controlled manually by the user.
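A text merger in the spirit of TextMerger might be sketched as follows. The function names, the `stdout` default file name and the per-subjob header lines are assumptions made for this example; the headers keep the origin of each chunk visible after concatenation.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def merge_text(parts: Iterable[Optional[str]]) -> str:
    """Concatenate subjob outputs in subjob order; None marks a missing output file."""
    chunks: List[str] = []
    for index, text in enumerate(parts):
        chunks.append(f"# --- subjob {index} ---\n")
        chunks.append(text if text is not None else "# (no output)\n")
    return "".join(chunks)


def read_subjob_outputs(subjob_dirs: Iterable[Path], name: str = "stdout") -> List[Optional[str]]:
    """Read one named file from each subjob's output directory, if it exists."""
    outputs: List[Optional[str]] = []
    for d in subjob_dirs:
        f = Path(d) / name
        outputs.append(f.read_text() if f.exists() else None)
    return outputs
```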



3 Implementation

In this section we provide details of the implementation of some of the most important parts of Ganga.
Fig. 3. A component class implements one of the abstract interfaces corresponding to the different parts of a job.

3.1 Components



Job components are implemented as plugin classes, imported by Ganga at start-up if enabled in a user configuration file. This means that users only see the components relevant to their specific area of work. Plugins developed and maintained by the Ganga team are included in the main Ganga distribution and are upgraded automatically when a user installs a newer Ganga version. Currently, the list includes around 15 generic plugins and around 20 plugins specific to ATLAS and LHCb. Plugins specific to other user communities need to be installed separately but could easily be integrated into the main Ganga distribution.

Plugin development is simplified by having a set of internal interfaces and a mechanism for generating proxy classes [23]. Component classes inherit from an interface class, as seen in Fig. 3. Each plugin class defines a schema, which describes the plugin attributes, specifying type (read-only, read-write, internal), visibility, associated user-convenience filters and syntax shortcuts.

The user does not interact with the plugin class directly but rather with an automatically generated proxy class, visible in the GPI. The proxy class only includes attributes defined as visible in the schema and methods selected for export in the plugin class. This separation of the plugin and proxy levels is very flexible. At the GPI level, the plugin implementation details are not visible; all proxy classes follow the same design logic (for example, copy-by-value); persistence is automatic; and session-level locking is transparent. In this way the low-level, internal API is separated from the user-level GPI.
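The schema-plus-proxy idea can be illustrated with a small sketch. `Attr`, `ExecutablePlugin` and `Proxy` are hypothetical; the access labels follow the read-only, read-write and internal categories named in the text, while filters, syntax shortcuts, persistence and locking are omitted.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Attr:
    default: Any
    access: str = "rw"       # "rw", "ro" (read-only) or "internal"
    visible: bool = True


class ExecutablePlugin:
    """Hypothetical plugin with a schema describing its attributes."""
    schema: Dict[str, Attr] = {
        "exe": Attr(""),
        "args": Attr(()),
        "pid": Attr(None, access="ro"),
        "workdir": Attr(None, access="internal", visible=False),
    }

    def __init__(self):
        self.values = {name: a.default for name, a in self.schema.items()}


class Proxy:
    """User-facing wrapper: only visible attributes, read-only ones protected."""

    def __init__(self, plugin):
        object.__setattr__(self, "_impl", plugin)

    def _lookup(self, name):
        attr = self._impl.schema.get(name)
        if attr is None or not attr.visible:
            raise AttributeError(name)
        return attr

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for schema attributes.
        self._lookup(name)
        return self._impl.values[name]

    def __setattr__(self, name, value):
        if self._lookup(name).access != "rw":
            raise AttributeError(f"{name!r} is read-only")
        self._impl.values[name] = value

    def __dir__(self):
        return sorted(n for n, a in self._impl.schema.items() if a.visible)
```

Because the proxy is driven entirely by the schema, a new plugin gets a consistent user-facing interface without writing any proxy code.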

The framework does not force developers to support all combinations of applications and backends, but only the ones that are meaningful or interesting. To manage this, the concept of a submission handler is introduced. The submission handler is a connector between the application and backend components. At submission time, it translates the internal representation of the application into a representation accepted by a specific backend. This strategy allows integration of inherently different backends and applications without forcing a lowest-common-denominator interface.
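A submission-handler registry keyed by (application, backend) pairs is one plausible way to support only meaningful combinations. The registry, the `handler` decorator and the two handlers below are hypothetical; an unsupported pair is reported as such instead of being forced through a lowest-common-denominator interface.

```python
import shlex
from typing import Callable, Dict, Tuple

Handler = Callable[[dict], dict]

# Only meaningful (application, backend) pairs are registered.
HANDLERS: Dict[Tuple[str, str], Handler] = {}


def handler(app_type: str, backend_type: str):
    """Decorator that registers a translation function for one pair."""
    def register(fn: Handler) -> Handler:
        HANDLERS[(app_type, backend_type)] = fn
        return fn
    return register


@handler("Executable", "Local")
def executable_on_local(app: dict) -> dict:
    # A local run can take the argument vector directly.
    return {"command": [app["exe"], *app["args"]]}


@handler("Executable", "Batch")
def executable_on_batch(app: dict) -> dict:
    # Batch systems usually expect a script, so wrap the command in one.
    return {"script": "#!/bin/sh\n" + shlex.join([app["exe"], *app["args"]]) + "\n"}


def prepare(app_type: str, backend_type: str, app: dict) -> dict:
    """Translate an application description for a specific backend."""
    try:
        fn = HANDLERS[(app_type, backend_type)]
    except KeyError:
        raise NotImplementedError(
            f"no submission handler for {app_type} on {backend_type}") from None
    return fn(app)
```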

Most of the plugins interact with the underlying backends using shell commands. This down-to-earth approach is particularly useful for encapsulating the environments of different subsystems and avoiding environment clashes. In verbose mode, Ganga prints each command executed so that a user may reproduce the commands externally if needed. Higher-level abstractions such as JSDL [24], OGSA-BES [25] or the SAGA API [26] are not currently used, but specific backends that support these standards could readily be added.
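Running each backend tool as a separate process, and echoing the exact command in verbose mode, might look like the sketch below. `run_backend_command` is a hypothetical helper built on the standard `subprocess` module; `shlex.join` (Python 3.8+) renders the command so that it can be pasted into a shell, and the optional `env` argument gives each call its own environment.

```python
import shlex
import subprocess
import sys
from typing import Dict, List, Optional


def run_backend_command(cmd: List[str], verbose: bool = False,
                        env: Optional[Dict[str, str]] = None) -> subprocess.CompletedProcess:
    """Run a backend tool in its own process; print the exact command if verbose."""
    if verbose:
        # A shell-ready rendering lets the user reproduce the call by hand.
        print("executing:", shlex.join(cmd))
    # env=None inherits the current environment; a dict isolates this call.
    return subprocess.run(cmd, capture_output=True, text=True, env=env, check=False)


result = run_backend_command([sys.executable, "-c", "print('ok')"], verbose=True)
print(result.returncode, result.stdout.strip())
```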



3.2 Job persistence

The job repository provides job persistence in a simple database, so that any subsequent Ganga session has access to all previously defined jobs. Once a job is defined in a Ganga session, it is automatically saved in the database. The repository provides a bookkeeping system that can be used to select particular jobs according to job metadata. The metadata includes such parameters as job name, type of application, type of submission backend, and job status. It can readily be extended as required.
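Metadata-based selection from a job repository can be sketched with Python's built-in `sqlite3` module. The table layout and the helper names are assumptions for illustration only; the actual repository may store jobs quite differently. Column names are checked against a whitelist because they are interpolated into the SQL text, while values are passed as parameters.

```python
import sqlite3
from typing import List, Tuple

METADATA_FIELDS = ("name", "application", "backend", "status")


def open_repository(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a job table; ':memory:' keeps everything in RAM."""
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "id INTEGER PRIMARY KEY, name TEXT, application TEXT, backend TEXT, status TEXT)"
    )
    return con


def save_job(con, name, application, backend, status="new") -> int:
    cur = con.execute(
        "INSERT INTO jobs (name, application, backend, status) VALUES (?, ?, ?, ?)",
        (name, application, backend, status),
    )
    con.commit()
    return cur.lastrowid


def select_jobs(con, **criteria) -> List[Tuple[int, str]]:
    """Return (id, name) of jobs whose metadata match all given criteria."""
    unknown = set(criteria) - set(METADATA_FIELDS)
    if unknown:
        raise KeyError(f"unknown metadata field(s): {sorted(unknown)}")
    # Field names come from the whitelist above; values are bound as parameters.
    where = " AND ".join(f"{key} = ?" for key in criteria) or "1"
    rows = con.execute(
        f"SELECT id, name FROM jobs WHERE {where} ORDER BY id",
        tuple(criteria.values()),
    )
    return rows.fetchall()
```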

Ganga supports both a local and a remote repository. In the case of the former, the database is stored in the local filesystem, providing a standalone solution. In the case of the latter, the client accesses an AMGA [28] metadata server. The remote server supports secure connections with user authentication and authorisation based on Grid certificates. Performance tests of both the local and remote repositories show good scalability for up to 10,000 jobs per user, with an average time of about 0.2 seconds to create an individual job. There is scope for further optimisation in this area by taking advantage of bulk operations and loading jobs on demand.
The job repository also includes a mechanism to support schema migration, allowing for evolution in the schema of plugin components.



3.3 Input and output files

Ganga stores job input and output files in a job workspace. The current implementation uses the local filesystem, and has a simple interface that allows transparent access to job files within the Ganga framework. These files are stored for each job in a separate directory, with sub-directories for input and output and for each subjob.
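The per-job directory layout described above could be computed as follows. The exact naming (job number, then subjob number, then `input`/`output`) is an assumption about the layout; `job_dirs` and `create_job_dirs` are hypothetical helpers.

```python
from pathlib import Path
from typing import Dict, List, Optional


def job_dirs(workspace: Path, job_id: int, subjob_id: Optional[int] = None) -> Dict[str, Path]:
    """Input/output directories for a job or one of its subjobs (assumed layout)."""
    base = Path(workspace) / str(job_id)
    if subjob_id is not None:
        base = base / str(subjob_id)
    return {"input": base / "input", "output": base / "output"}


def create_job_dirs(workspace: Path, job_id: int, n_subjobs: int = 0) -> List[Dict[str, Path]]:
    """Create the directories for a job and its subjobs; existing ones are kept."""
    layout = [job_dirs(workspace, job_id)]
    layout += [job_dirs(workspace, job_id, i) for i in range(n_subjobs)]
    for dirs in layout:
        for directory in dirs.values():
            directory.mkdir(parents=True, exist_ok=True)
    return layout
```

For example, `job_dirs(Path("/ws"), 7, 2)["output"]` points to `/ws/7/2/output`.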

Users may access the job files directly in the filesystem or by using Ganga commands such as job.peek(). Internally, Ganga handles the input and output files using a simple abstraction layer, which allows for trivial integration of additional workspace implementations. Tests with a prototype using a WebDAV [30] server have shown that all workspace data related to a job can be accessed from different locations. In this case, a workspace cache remains available on the local filesystem.

The combination of a remote workspace and a remote job repository effectively creates a roaming profile, where the same Ganga session can be accessed at multiple locations, similar to the situation for accessing e-mail messages on an IMAP [31] server.



4 Monitoring

Ganga provides two types of monitoring: internal monitoring updates the user with information on the progress of jobs, and external monitoring deals with information from third-party services.



4.1 Internal monitoring

Ganga automatically keeps track of changes in job status, using a monitoring procedure designed to cope with varying backend response times and load capabilities. As seen in Fig. 4, each backend is polled in a different thread taken from a pool, and there is an efficient mechanism to avoid deadlocks from backends that respond slowly. The poll rate may be set separately for each backend.
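The polling scheme might be approximated with a thread pool that never blocks on a slow backend. `MonitorSketch` and its methods are hypothetical; the sketch assumes each backend exposes a callable returning a status dictionary, and it skips a backend whose previous query is still running rather than queueing another one.

```python
import concurrent.futures as cf
import time
from typing import Callable, Dict, Optional


class MonitorSketch:
    """Hypothetical poller: each backend query runs in a pool thread, at its own rate."""

    def __init__(self, max_workers: int = 4):
        self.pool = cf.ThreadPoolExecutor(max_workers=max_workers)
        self.queries: Dict[str, Callable[[], dict]] = {}
        self.intervals: Dict[str, float] = {}
        self.last_poll: Dict[str, float] = {}
        self.pending: Dict[str, cf.Future] = {}
        self.status: Dict[str, dict] = {}

    def add_backend(self, name: str, query: Callable[[], dict], interval_s: float) -> None:
        self.queries[name] = query
        self.intervals[name] = interval_s
        self.last_poll[name] = float("-inf")

    def tick(self, now: Optional[float] = None) -> Dict[str, dict]:
        """One monitoring round. It never waits for a backend to answer."""
        now = time.monotonic() if now is None else now
        for name, query in self.queries.items():
            future = self.pending.get(name)
            if future is not None:
                if not future.done():
                    continue              # slow backend: skip instead of piling up polls
                try:
                    self.status[name] = future.result()
                except Exception as exc:  # a failing backend must not stop the others
                    self.status[name] = {"error": str(exc)}
                del self.pending[name]
            if now - self.last_poll[name] >= self.intervals[name]:
                self.pending[name] = self.pool.submit(query)
                self.last_poll[name] = now
        return dict(self.status)

    def shutdown(self) -> None:
        self.pool.shutdown(wait=True)
```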
Fig. 4. The internal monitoring updates the status of jobs using a pool of threads running in the Ganga core. An additional monitoring thread runs in a job wrapper and sends the monitoring information to external services.

The monitoring sub-system also keeps track of the remaining validity of authentication credentials, such as Grid proxies and Kerberos tokens. The user is notified that renewal is required, and if no action is taken then Ganga is placed in a state where operations requiring valid credentials are disabled.



4.2 External monitoring

Ganga's external monitoring provides a mechanism for dynamically adding third-party monitoring sensors, to allow reporting of different metrics for running jobs.

The monitoring sensors can be inserted both on the client side, where Ganga runs, and in the remote environment (worker node) where the application runs, allowing the user to follow the entire execution flow. Monitoring events are generated at job submission time, at startup, periodically during execution, and at completion.

Individual application and backend components in Ganga can be configured to use different monitoring sensors, allowing collection of both generic execution information and application-specific data.
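Attaching sensors per component and notifying them at the lifecycle points listed above could be sketched as a simple observer registry. `SensorRegistry`, the event names and the callback signature are assumptions for illustration and do not describe any particular monitoring service.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

# Lifecycle points at which the text says monitoring events are generated.
EVENTS = ("submitted", "started", "heartbeat", "completed")

Sensor = Callable[[str, Dict], None]


class SensorRegistry:
    """Hypothetical registry: sensors are attached per application/backend component."""

    def __init__(self):
        self._sensors: DefaultDict[str, List[Sensor]] = defaultdict(list)

    def attach(self, component: str, sensor: Sensor) -> None:
        self._sensors[component].append(sensor)

    def emit(self, component: str, event: str, info: Dict) -> None:
        if event not in EVENTS:
            raise ValueError(f"unknown event {event!r}")
        # .get() avoids creating empty entries for components with no sensors.
        for sensor in self._sensors.get(component, []):
            sensor(event, info)
```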
Two implementations of external monitoring sensors are currently in use. One is the ATLAS Dashboard application monitoring [32]. The other is a custom service that allows the Ganga user to examine job output in real time on the Grid. This streaming service is not enabled by default; it must be set up separately for each user community, and may then be requested by a user for specific jobs.



5 Graphical User Interface

The Ganga Graphical User Interface (GUI), shown in Fig. 5 and built using PyQt3 [33], makes available all of the job-management functionality provided at the level of the Ganga Public Interface. The GUI incorporates various convenience features, and its multi-threaded nature results in a degree of parallelism not possible at the command line: job monitoring and most job-management actions run concurrently, ensuring a good response time for the user.
Fig. 5. Ganga graphical user interface (GUI). The overview of jobs can be seen on the left, and the details of an individual job on the right.

The job monitoring window takes centre stage, with job status and other monitored attributes displayed in table format. Other features include subjob monitoring, subjob folding/hiding, a job-details display drawer, a logical-collections drawer, and a text-based job-search facility. Many characteristics of the monitoring window can be customised, allowing, for example, selection of the job attributes to be monitored, and of the colours used to denote different job states.

The construction of a job, entailing selection of the required plugins and the entry of attribute values, is achieved from a job-builder window. This displays a foldable tree of job attributes, and associated data-entry widgets. The tree and widgets are generated dynamically based on plugin schemas, ensuring that the GUI automatically supports user-defined plugins without any change being needed to the GUI code. To assist with data entry, drop-down menus list allowed values, wherever these are defined, and tool tips provide explanations of individual job attributes. The job-builder window also features tool buttons for performing a wide range of job-related actions, including creation, saving, copying, submission, termination and removal. Finally, a multifunction Extras tool button provides access to arbitrary additional functionality implemented in the plugins.

The GUI also has a scriptor window, providing a favourite-scripts collection, a job-script editor and an embedded Python session. The favourite-scripts collection allows frequently used Ganga scripts to be created, imported, exported and cloned; the job-script editor facilitates quick modification and execution of scripts; and the embedded Python session allows interactive use of Ganga commands.

Finally, a scrollable log window collects and displays all messages generated by Ganga.



6 Use in experiments at the Large Hadron Collider

The ATLAS and LHCb experiments aim to make discoveries about the fundamental nature of the Universe by detecting new particles at high energies, and by performing high-precision measurements of particle decays. The experiments are located at the Large Hadron Collider (LHC) at the European Laboratory for Particle Physics (CERN), Geneva, with first particle collisions (events) expected in 2009. Both experiments require processing of data volumes of the order of petabytes per year, rely on computing resources distributed across multiple locations, and exploit several Grid implementations. The data-processing applications, including simulation, reconstruction and final analysis for the experiments, are based on the C++ Gaudi/Athena [34] framework. This provides core services, such as message logging, data access, histogramming, and a run-time configuration system.
The data from the experiments will be distributed among computing facilities around the world. Users performing data analysis need an on-demand access mechanism that allows rapid pre-filtering of data based on certain selection criteria, so as to identify data of specific interest.

The role of Ganga within ATLAS and LHCb is to act as the interface for data analysis by a large number of individual physicists. Ganga also allows for the easy exchange of jobs between users, something that can otherwise be difficult because of the complex configuration of analysis jobs.



6.1 The LHCb experiment

The LHCb experiment is dedicated to studying the properties of B mesons (particles containing the b quark). In this section we describe how Ganga interacts with the application and backend plugins specific to LHCb.

In a typical analysis, users supply their own shared libraries, containing user-written classes, and these are loaded at run-time. The LHCb applications are driven by a configuration file, which includes definitions of the libraries to load, non-default values for object parameters, the input data to be read, and the output to be created.

Ganga includes an application component for Gaudi-based applications to simplify the task of performing an analysis. During the configuration stage, and before job submission, the application component undertakes the following tasks:

• it locally sets up the environment for the chosen application;
• it determines the user-owned shared libraries required to run the job;
• it parses the configuration file supplied, including all its dependencies;
• it uses information obtained from the configuration file to determine the input data required and the outputs expected;
• it registers the inputs and outputs with the submission backend.

The user, then, only needs to specify the name and version of the application to run, and the configuration file to be used.
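
The pre-submission steps listed above can be sketched in plain Python. This is a minimal sketch, not Ganga's implementation: the configuration syntax (`include <file>` and `key = value` lines with `library`, `input` and `output` keys) is a hypothetical simplification rather than the real Gaudi options format, environment setup is omitted, and `RecordingBackend` is a hypothetical stand-in for a submission backend.

```python
from typing import Dict, List, Optional, Set

# Hypothetical, simplified configuration syntax used only in this sketch:
#   include <file>   pull in another configuration file (a dependency)
#   key = value      with keys such as library / input / output
ConfigFiles = Dict[str, str]  # file name -> file contents


def parse_config(name: str, files: ConfigFiles,
                 seen: Optional[Set[str]] = None) -> Dict[str, List[str]]:
    """Parse `name` and, recursively, every file it includes."""
    seen = set() if seen is None else seen
    if name in seen:                      # guard against include cycles
        return {}
    seen.add(name)
    settings: Dict[str, List[str]] = {}
    for raw in files[name].splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("include "):
            child = parse_config(line.split(None, 1)[1], files, seen)
            for key, values in child.items():
                settings.setdefault(key, []).extend(values)
        else:
            key, value = (part.strip() for part in line.split("=", 1))
            settings.setdefault(key, []).append(value)
    return settings


class RecordingBackend:
    """Stand-in for a submission backend: it only records registered files."""

    def __init__(self) -> None:
        self.inputs: List[str] = []
        self.outputs: List[str] = []

    def register(self, inputs: List[str], outputs: List[str]) -> None:
        self.inputs.extend(inputs)
        self.outputs.extend(outputs)


def prepare(main: str, files: ConfigFiles, backend: RecordingBackend) -> List[str]:
    """Register inputs/outputs with the backend; return the user libraries to ship."""
    settings = parse_config(main, files)
    backend.register(settings.get("input", []), settings.get("output", []))
    return settings.get("library", [])
```

For example, if `analysis.cfg` includes `common.cfg`, `prepare` returns the libraries named in either file and registers the inputs and outputs collected from both; the `seen` set stops mutually including files from recursing forever.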

Code under development by a user may contain bugs that cause runtime errors during job execution. The transparent switching between processing systems when using Ganga means that debugging can be performed locally, with quick response time, before launching a large-scale analysis on the Grid, where response times tend to be longer.
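
The local-then-Grid workflow relies on the job description being independent of where it runs. The sketch below illustrates that design with hypothetical classes (`Job`, `LocalBackend`, `FakeGridBackend`), loosely mirroring the `copy()` and `backend` usage in the appendix examples; it is not Ganga's code, and the Grid backend only queues jobs instead of contacting real middleware.

```python
import copy
import subprocess
from dataclasses import dataclass, field
from typing import List, Optional


class LocalBackend:
    """Runs the job immediately on this machine: quick turnaround for debugging."""

    def submit(self, job: "Job") -> str:
        result = subprocess.run([job.exe, *job.args],
                                capture_output=True, text=True, check=True)
        return result.stdout


class FakeGridBackend:
    """Stand-in for a Grid backend: it only queues the job description."""

    def __init__(self) -> None:
        self.queue: List["Job"] = []

    def submit(self, job: "Job") -> str:
        self.queue.append(job)
        return f"queued {job.name} ({len(self.queue)} in queue)"


@dataclass
class Job:
    name: str
    exe: str
    args: List[str] = field(default_factory=list)
    backend: object = field(default_factory=LocalBackend)

    def copy(self, name: Optional[str] = None) -> "Job":
        """Deep-copy the job so the clone can be retargeted independently."""
        clone = copy.deepcopy(self)
        if name is not None:
            clone.name = name
        return clone

    def submit(self) -> str:
        # The job never needs to know which backend it is handed to.
        return self.backend.submit(self)
```

A job whose executable is the current Python interpreter with arguments `-c "print('hello')"` returns `hello` when submitted locally; its copy, given a `FakeGridBackend`, is queued instead, while the original job is left unchanged.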

Some studies in LHCb are performed using the RooFit [37] framework rather than Gaudi, most notably studies that make use of simplified event simulations. Jobs for these studies require large amounts of processing power, but need no input data and produce only small amounts of output. This makes them very easy to deploy on the Grid, with support in Ganga provided by a generic ROOT [22] application component.

In the LHCb computing model [29], Grid jobs are routed through the DIRAC [35] workload management system (WMS). DIRAC is a pilot-based system: user jobs are queued in the WMS server, and the server submits generic pilot scripts to the Grid. Each pilot queries the WMS for a job whose resource requirements are satisfied by the machine where the pilot script is running. If a compatible job is available, it is pulled from the WMS and started. Otherwise, the pilot terminates and the WMS sends a new pilot to the Grid. This system improves the reliability of the Grid as seen by the user, because user jobs are only started on worker nodes where a pilot is already running, so problems with misconfigured nodes affect pilots rather than user jobs. Ganga provides a DIRAC backend component that supports submission of jobs to the DIRAC WMS, making use internally of DIRAC's Python API [36].
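
The pull model described above can be simulated in a few lines. This sketch assumes a single hypothetical resource requirement (`min_memory_gb`) and first-fit matching; real DIRAC matching uses richer criteria, and none of the names here belong to the DIRAC API.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional


@dataclass
class UserJob:
    name: str
    min_memory_gb: int  # hypothetical resource requirement


class JobQueue:
    """Central WMS queue holding user jobs until a suitable pilot asks for one."""

    def __init__(self, jobs: List[UserJob]) -> None:
        self.waiting: Deque[UserJob] = deque(jobs)

    def match(self, memory_gb: int) -> Optional[UserJob]:
        # First-fit: hand out the first queued job this worker node can satisfy.
        for job in list(self.waiting):
            if job.min_memory_gb <= memory_gb:
                self.waiting.remove(job)
                return job
        return None


def run_pilot(queue: JobQueue, node: str, memory_gb: int,
              ran: Dict[str, str]) -> bool:
    """A pilot has started on `node`: pull a compatible job, or give up."""
    job = queue.match(memory_gb)
    if job is None:
        return False        # pilot terminates; the WMS will send another one
    ran[job.name] = node    # the user job starts on a node known to be working
    return True
```

With a queue holding an 8 GB job and a 2 GB job, a pilot on a 4 GB node takes the small job, a second 4 GB pilot finds nothing compatible and exits, and a 16 GB pilot picks up the large job.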

A splitter component implemented specifically for LHCb is able to divide the analysis of a large dataset into many smaller subjobs. During the splitting, a file catalogue is queried to ensure that all data associated with an individual subjob are available in their entirety at at least one location on the Grid. This gives a significant optimisation, as it avoids subjobs having to copy data across the network before an analysis can start.
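
One simple way to form location-aware subjobs is a greedy pass over the file list that keeps the intersection of replica sites for the current group. This is an illustrative sketch, not the LHCb splitter: the `replicas` dictionary stands in for the file-catalogue query, site names are hypothetical, and files are grouped in the order given rather than reordered for optimal packing.

```python
from typing import Dict, List, Set, Tuple

Subjob = Tuple[List[str], Set[str]]  # (files, sites holding all of them)


def split_by_location(files: List[str], replicas: Dict[str, Set[str]],
                      max_files: int) -> List[Subjob]:
    """Group files so that every subjob's files share at least one common site."""
    subjobs: List[Subjob] = []
    group: List[str] = []
    sites: Set[str] = set()
    for f in files:
        common = sites & replicas[f] if group else set(replicas[f])
        if group and (not common or len(group) == max_files):
            subjobs.append((group, sites))      # close the current subjob
            group, common = [], set(replicas[f])
        group.append(f)
        sites = common
    if group:
        subjobs.append((group, sites))
    return subjobs
```

With `a.dst` at SITE_A and SITE_B, `b.dst` at SITE_B, and `c.dst` and `d.dst` both at SITE_C, the pass yields two subjobs: `a`/`b` to run at SITE_B and `c`/`d` to run at SITE_C.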

In total, more than 300,000 user jobs finished successfully in 2008, with a total CPU consumption of 87 CPU years. The jobs ran at a total of 140 Grid sites across the globe. The system was responsive to a highly irregular usage pattern, and spikes of several thousand simultaneous jobs were observed during the year. This usage is expected to rise dramatically after the start of LHCb data taking.

The Robot in Ganga is used within LHCb for end-to-end testing of the distributed analysis model. It submits a representative set of analysis jobs on a daily basis, monitors their progress, and checks the results produced. The overall success rate and the time to obtain the results are recorded and published on the web. The Robot monitors this information, producing statistics on the long-term system performance.
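
The statistics the Robot publishes reduce to simple aggregates over test-job records. The sketch below assumes a hypothetical record holding a day label, a success flag and the hours taken to obtain results; it computes, per day, the success fraction and the mean turnaround of successful jobs.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Tuple


@dataclass
class RobotRecord:
    day: str                # label for the day the test job was submitted
    succeeded: bool         # output produced and checked successfully
    hours_to_result: float  # time from submission until results were available


def daily_summary(records: List[RobotRecord]) -> List[Tuple[str, float, float]]:
    """Per day: (day, success fraction, mean hours to result of successful jobs)."""
    summary = []
    for day in sorted({r.day for r in records}):
        todays = [r for r in records if r.day == day]
        ok = [r for r in todays if r.succeeded]
        turnaround = mean(r.hours_to_result for r in ok) if ok else float("nan")
        summary.append((day, len(ok) / len(todays), turnaround))
    return summary
```

Accumulating these daily tuples over many months gives the kind of long-term performance record described above.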



6.2 The ATLAS experiment

ATLAS is a general-purpose experiment, designed to allow observation of new phenomena in high-energy proton-proton collisions.

The distributed analysis model is part of the ATLAS computing model [38], which requires that data are distributed across various computing sites and that user jobs are sent to the data.

An ATLAS analysis job typically consists of a Python or shell script that configures and runs user algorithms in the Athena framework [38], reads and writes event files, and fills histograms/n-tuples. More-interactive analysis may be performed on large datasets stored as n-tuples.

There are several scenarios relevant for a user analysis. Some analyses require a fast response time and a high level of user interaction, for which the parallel ROOT facility PROOF [41] is well suited. Other analyses need little user interaction and can tolerate long response times; in these cases Ganga and Grid processing are ideal.

Analysis jobs can produce large amounts of data, which may initially be stored at a single Grid site and may subsequently need to be transferred to other machines. This is supported in ATLAS by the Distributed Data Management system DQ2 [39], which provides a set of services for moving data between Grid-enabled computing facilities and maintains a series of databases that track the data movements. The vast amounts of data involved are grouped into datasets based on various criteria, for example physics characteristics, to make queries and retrievals more efficient.

6.2.1 ATLAS Grid infrastructures

The ATLAS experiment employs three Grid infrastructures for user analysis and for collaboration-wide event simulation and reconstruction. These are the Grid developed in the context of Enabling Grids for e-Science (EGEE, mainly Europe) [42], accessed using gLite middleware [8]; the Open Science Grid (OSG, mainly North America) [21], accessed using the PanDA system [40]; and NorduGrid (mainly Nordic countries) [43], accessed using the ARC middleware [20]. Ganga seamlessly submits jobs to all three Grid flavours.

6.2.2 ATLAS user analysis

A typical ATLAS user analysis consists of an event-selection algorithm developed in the Athena framework. Large amounts of data are filtered to identify events that meet certain selection criteria. The events of interest are stored in files grouped together as datasets in the DQ2 system. The Ganga components for Athena jobs include the following functionality:

• During job submission, DQ2 is queried for the file content and location of the dataset to be analysed. The set of candidate Grid sites is then restricted to the dataset locations.
• A job can be divided into several subjobs, each processing a given number of files from the full dataset.
• In a Grid job, after the Athena application has completed, the user output is stored on the storage element of the site where the job was run, and is registered in DQ2.
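
The first two items above amount to a set intersection followed by chunking. In this sketch the `catalogue` dictionary is a hypothetical stand-in for the DQ2 query (no DQ2 client calls are used), `usable_sites` is an assumed set of sites the user may submit to, and the dataset name is made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class DatasetInfo:
    files: List[str]     # file content of the dataset
    locations: Set[str]  # sites holding a complete replica


@dataclass
class SubjobPlan:
    files: List[str]
    sites: List[str]


def plan_subjobs(dataset: str, catalogue: Dict[str, DatasetInfo],
                 usable_sites: Set[str], files_per_subjob: int) -> List[SubjobPlan]:
    """Restrict candidate sites to the dataset locations, then split by file count."""
    info = catalogue[dataset]                    # stand-in for the DQ2 query
    sites = sorted(usable_sites & info.locations)
    if not sites:
        raise RuntimeError(f"no usable site holds a replica of {dataset}")
    return [SubjobPlan(info.files[i:i + files_per_subjob], sites)
            for i in range(0, len(info.files), files_per_subjob)]
```

A five-file dataset split two files at a time gives three subjobs, the last holding a single file, all restricted to the sites that both hold the data and are usable.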

In the second half of 2008, more than 4 × 10^5 Grid jobs were submitted through Ganga by ATLAS users. Following a procedure similar to that of LHCb, the Ganga Robot submits test jobs daily to ATLAS Grid sites. Test results are used to guide users to sites that are performing well, avoiding job failures on temporarily misconfigured sites.
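
Using Robot results to steer users can be as simple as ranking sites by recent test-job success. The 0.8 threshold and the data layout below are assumptions for illustration; the criteria actually applied to ATLAS sites are not specified here.

```python
from typing import Dict, List


def good_sites(results: Dict[str, List[bool]], threshold: float = 0.8) -> List[str]:
    """Sites whose recent test jobs succeeded at least `threshold` of the time,
    best first (ties broken alphabetically). Sites with no results are skipped."""
    ranked = []
    for site, outcomes in results.items():
        if outcomes:
            rate = sum(outcomes) / len(outcomes)
            if rate >= threshold:
                ranked.append((rate, site))
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return [site for _, site in ranked]
```

A site that recently failed most of its test jobs, for example because of a temporary misconfiguration, simply drops out of the list until its tests recover.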

6.2.3 ATLAS small-scale event simulations

In addition to data analysis, users sometimes need to simulate event samples of the order of a few tens of thousands of events. The AthenaMC application component has been developed to integrate software used in the official ATLAS system for event simulation. This component consists of a set of Python classes that together handle input parameters, input datasets and output datasets for the three production steps: event generation, detector simulation, and event reconstruction. As in the case of user analysis, datasets are managed by the DQ2 system.
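
The three production steps form a chain in which each step reads the dataset written by the previous one. The sketch below expresses only that dependency structure; the step labels, dataset-naming scheme and `StepJob` class are hypothetical, not AthenaMC's classes.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical labels for event generation, detector simulation, reconstruction.
STEPS = ["evgen", "simul", "recon"]


@dataclass
class StepJob:
    step: str
    input_dataset: Optional[str]  # None: event generation reads no dataset
    output_dataset: str
    n_events: int


def production_chain(prefix: str, n_events: int) -> List[StepJob]:
    """Chain the steps so that each consumes the previous step's output."""
    chain: List[StepJob] = []
    previous: Optional[str] = None
    for step in STEPS:
        output = f"{prefix}.{step}"
        chain.append(StepJob(step, previous, output, n_events))
        previous = output
    return chain
```

For a prefix `user.demo.mc`, simulation reads `user.demo.mc.evgen` and reconstruction reads `user.demo.mc.simul`, with the final output in `user.demo.mc.recon`.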



7 Other usage areas

Ganga offers a flexible and extensible interface that makes it useful beyond the original scope of particle-physics applications in the ATLAS and LHCb experiments. Here we provide details of just a few of the other contexts in which Ganga has been used.

7.1 Enabling industrial-scale image retrieval

Imense Ltd (http://imense.com), a Cambridge-based startup company, has implemented a novel image-retrieval system (Fig. 6), featuring automated analysis and recognition of image content, and an ontological query language. The proprietary image analysis, developed from published research [44], includes recognition of visual properties, such as colour, texture and shape; recognition of materials, such as grass or sky; detection of objects, such as human faces, and determination of their characteristics; and classification of scenes by content, for example beach, forest or sunset. The system uses semantic and linguistic relationships between terms to interpret user queries and retrieve relevant images on the basis of the analysis results. Moreover, the system is extensible, so that additional image-classification modules or image context and metadata can easily be integrated into the index.



[Fig. 6 components: Retrieval Requirements; Ontological Query Language (Cambridge Ontology Ltd.); Semantic descriptor extraction; Object formation; Relevance Assessment. Copyright 2007.]



Fig. 6. Schematic representation of the image-retrieval system developed by Imense Ltd. Image characteristics are determined by applying feature-extraction algorithms, and an ontological query language bridges the semantic gap between terms that might be employed in a user query and terms understood by the processing system.



By using the Ganga framework for job submission and management, it has been possible to port and deploy a large part of Imense's image-analysis technology to the Grid and build a searchable index for more than twenty million high-resolution photographic images.



The two processing stages of the image-search system, image analysis followed by indexing, must run in sequence. The analysis stage, however, has been parallelised at the level of single images or small subsets of images, so each image can be processed in isolation on the Grid; this processing usually takes between a few seconds and ten seconds per image. In order to minimise overheads, images are grouped in sets of a few hundred per job submitted through Ganga. Results of the image processing and analysis are passed back to the submission server once a job has successfully completed.
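
The batching trade-off is easy to quantify. Assuming, purely for illustration, 300 images per job at 5 s per image and a fixed overhead of 600 s per Grid job, each job does 1500 s of analysis, so about 71% of its wall time is useful work (1500/2100), compared with under 1% (5/605) if every image were a separate job; twenty million images then need 66,667 jobs. The helper functions below encode this arithmetic and the grouping itself.

```python
from typing import Iterator, List, Sequence


def batches(images: Sequence[str], per_job: int) -> Iterator[List[str]]:
    """Yield consecutive groups of `per_job` images; the last may be smaller."""
    for start in range(0, len(images), per_job):
        yield list(images[start:start + per_job])


def job_count(n_images: int, per_job: int) -> int:
    """Number of jobs needed, using ceiling division."""
    return -(-n_images // per_job)


def efficiency(per_job: int, seconds_per_image: float, overhead_s: float) -> float:
    """Fraction of a job's wall time spent on analysis rather than fixed overhead."""
    work = per_job * seconds_per_image
    return work / (work + overhead_s)
```

Larger batches raise the useful fraction further but lengthen each job and increase the work lost when a single job fails, which is one reason to stop at a few hundred images.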



Support for Imense has been added to Ganga through the implementation of two specialised components: an application component that deals with running the image-processing software, and a dataset component for taking care of the output. As usual with Ganga, the jobs can run both locally and on the Grid, giving maximum flexibility.

At runtime, images are retrieved and segmented one at a time, all of the images are classified, and finally an archive of the output files (several per input image) is created. The archive is returned using the sandbox mechanism in Ganga when using the Local backend, and is uploaded to a storage element when using the Grid LCG backend.

The specialised dataset component provides methods for downloading a results archive from a storage element, and for unpacking an archive to a destination directory. These methods are invoked automatically by Ganga when an image-processing job completes: the effect for the user is that a list of images is submitted for processing and the results are placed in the requested output location, independently of the backend used.
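
The unpacking half of the dataset component can be sketched with Python's standard `tarfile` module. This assumes the results archive is a tar file (possibly compressed), which the text does not specify, and omits the download from a storage element, since that depends on Grid data-management tools. The path check rejects entries that would be written outside the destination directory.

```python
import tarfile
from pathlib import Path
from typing import List


def unpack_results(archive: Path, destination: Path) -> List[Path]:
    """Unpack a results archive into `destination`; return the extracted files."""
    destination.mkdir(parents=True, exist_ok=True)
    root = destination.resolve()
    with tarfile.open(archive, "r:*") as tar:   # "r:*" autodetects compression
        members = tar.getmembers()
        for member in members:
            target = (root / member.name).resolve()
            if target != root and root not in target.parents:  # e.g. "../x"
                raise ValueError(f"refusing unsafe archive entry: {member.name}")
        tar.extractall(root)
    return sorted(root / m.name for m in members if m.isfile())
```

Because the function only needs a local archive path, the same code serves whether the archive arrived through the output sandbox (Local backend) or was first fetched from a storage element (Grid backend).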

7.2 Smaller collaborations in High Energy Physics

Large user communities, such as ATLAS and LHCb, profit from encapsulating shared use cases as specialised applications in Ganga. In contrast, individual researchers, or developers engaged in rapid prototyping, may opt to use generic application components. In such cases, Ganga still provides the benefits of bookkeeping and a programmatic interface for job submission.

As an example of this way of working, a small community of experts in the design of gaseous detectors uses Ganga to run the Garfield [45] simulation program on the Grid. A Ganga script has been written that generates a chain of simulation jobs using the Garfield generator of macro files and Ganga's Executable application component. The Garfield executables, and a few small input files, are placed in the input sandbox of each job. Histograms and text output are then returned in the output sandbox. This simple approach allowed Garfield jobs to be integrated in Ganga in just a few hours.
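
A chain of this kind could be generated by a short script that writes one macro file per scan point and describes one executable job per macro. Everything named here is hypothetical: the `ExecutableJob` class stands in for a job using Ganga's Executable application, the macro contents are placeholders rather than Garfield syntax, and the executable and output file names are invented.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List


@dataclass
class ExecutableJob:
    """Hypothetical job description: an executable plus its two sandboxes."""
    exe: str
    args: List[str]
    input_sandbox: List[str] = field(default_factory=list)
    output_sandbox: List[str] = field(default_factory=list)


def make_chain(workdir: Path, scan_points: List[int],
               exe: str = "garfield") -> List[ExecutableJob]:
    """Write one macro file per scan point and describe one job per macro."""
    jobs: List[ExecutableJob] = []
    for point in scan_points:
        macro = workdir / f"scan_{point}.mac"
        # Placeholder content: a real chain would write Garfield commands here.
        macro.write_text(f"* scan point {point}\n")
        jobs.append(ExecutableJob(
            exe=exe,
            args=[macro.name],
            input_sandbox=[exe, str(macro)],   # executable plus small input files
            output_sandbox=[f"scan_{point}.hist", f"scan_{point}.txt"],
        ))
    return jobs
```

Each resulting description maps directly onto a generic executable job whose sandboxes carry the program in and the histograms and text output back.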

7.3 Ganga integrated with lightweight Grid middleware

The open-plugin architecture of Ganga allows easy integration of additional Grid middleware, as has been achieved, for example, with the ARC (Advanced Resource Connector) Grid middleware [20]. This is a product of the NorduGrid project [43], and is used by many academic institutions in the Nordic countries and elsewhere.

ARC jobs are accepted and brokered by a Grid manager running at site level, and resource lookup is done through load balancing and runtime environments advertised by individual sites. File storage and access is 'cloudy', meaning that all files registered in Grid-wide catalogues are accessible to all worker nodes. File transfers are handled by the Grid manager, between job acceptance and execution. ARC-connected resources are used, for example, by researchers in bioinformatics, genomics and meteorology, in addition to high-energy physics.

Ganga has been interfaced to ARC through a backend, which translates Ganga input into the ARC-readable xRSL language. The ARC user client is lightweight, and binaries are provided as an external library at Ganga install time. The main user of this integration is the ATLAS experiment (see Sec. 6.2), where it is the main user access portal to one of the experiment's three main computing grids. Further collaboration between ARC and Ganga is envisaged, to employ Ganga as a fully featured frontend to ARC.
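
The translation performed by the ARC backend can be illustrated by rendering a small job description as xRSL. This sketch covers only a few common xRSL attributes (`executable`, `arguments`, `inputFiles`, `outputFiles`, `stdout`, `stderr`, `jobName`), uses a hypothetical `JobSpec` class, and simply rejects values containing double quotes instead of escaping them; the full language is defined in the ARC documentation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class JobSpec:
    """Hypothetical, minimal job description to be translated."""
    name: str
    exe: str
    args: List[str] = field(default_factory=list)
    inputsandbox: List[str] = field(default_factory=list)
    outputsandbox: List[str] = field(default_factory=list)


def _q(value: str) -> str:
    if '"' in value:
        raise ValueError("double quotes inside values are not handled here")
    return f'"{value}"'


def _file_list(names: List[str]) -> str:
    # (name URL) pairs; as we understand xRSL, an empty URL means the file is
    # moved via the user's client rather than from/to a remote storage URL.
    return " ".join(f'({_q(n)} "")' for n in names)


def to_xrsl(spec: JobSpec) -> str:
    """Render `spec` as an xRSL conjunction of attribute relations."""
    parts = [f"(executable={_q(spec.exe)})"]
    if spec.args:
        parts.append("(arguments=" + " ".join(_q(a) for a in spec.args) + ")")
    if spec.inputsandbox:
        parts.append(f"(inputFiles={_file_list(spec.inputsandbox)})")
    if spec.outputsandbox:
        parts.append(f"(outputFiles={_file_list(spec.outputsandbox)})")
    parts += ['(stdout="stdout")', '(stderr="stderr")', f"(jobName={_q(spec.name)})"]
    return "&" + "".join(parts)
```

For a job named `demo` running `run.sh 1 2` with `run.sh` as input and `out.txt` as output, this produces `&(executable="run.sh")(arguments="1" "2")(inputFiles=("run.sh" ""))(outputFiles=("out.txt" ""))(stdout="stdout")(stderr="stderr")(jobName="demo")`.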



8 Interfacing to other frameworks



The Ganga Public Interface constitutes an API for generic job submission and management. As a result, Ganga may be programmatically interfaced to other frameworks, and used as a convenient abstraction layer for job management. Ganga has been used in combination with DIANE [46], a lightweight agent-based scheduling layer on top of the Grid, in a number of scientific activities. These have included: dosimetry-related simulation studies in medical physics [47]; regression testing of the Geant4 [48] detector-simulation toolkit; in-silico molecular docking in searches for new drugs against potential variants of an influenza virus [49]; telecommunication applications [50]; and theoretical physics [51]. The DIANE worker agents are executed as Ganga jobs, so that resource usage may be controlled by the user from the Ganga interface. This approach combines the efficiency of the DIANE overlay scheduling system with the well-structured job management offered by Ganga, and brings Grid and non-Grid resources together under a uniform interface. It also allows efficient implementation of low-latency access to Grid resources, and improves responsiveness when supporting on-demand computing and interactivity [52].
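
The benefit of DIANE-style late binding, where each worker agent asks for a new task only when it is free, can be shown with a small discrete-event simulation. This models only the scheduling idea; the function, worker names and speeds are hypothetical, and the real DIANE master-worker protocol is not reproduced.

```python
import heapq
from typing import Dict, List, Tuple


def simulate_pull(n_tasks: int, task_cost: float,
                  speeds: Dict[str, float]) -> Tuple[Dict[str, int], float]:
    """Workers pull a new task as soon as they finish the previous one.

    Returns the number of tasks done by each worker and the finishing time.
    """
    done = {name: 0 for name in speeds}
    # Heap of (time the worker becomes free, worker name); all idle at t = 0.
    free_at: List[Tuple[float, str]] = [(0.0, name) for name in sorted(speeds)]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(n_tasks):
        t, name = heapq.heappop(free_at)        # next worker to ask the master
        end = t + task_cost / speeds[name]
        done[name] += 1
        finish = max(finish, end)
        heapq.heappush(free_at, (end, name))
    return done, finish
```

With six unit-cost tasks and one worker twice as fast as the other, the fast worker completes four tasks and everything finishes at t = 2.0, whereas a fixed up-front split of three tasks each would leave the slow worker busy until t = 3.0.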

Ganga may also be embedded in web-based services, such as the bioinformatics portal developed by ASGC, Taipei. The portal is fully customised for analysis of candidate drugs against avian flu. The portal engine delegates job management to the embedded DIANE/Ganga framework, as shown in Fig. 7. Following this approach, users can switch between different resources, or access heterogeneous computing environments, through a single web interface.

Fig. 7. Ganga as a job-management component embedded in DIANE, with an application portal.

9 Conclusion

Ganga has been presented as a tool for job management in an environment of heterogeneous resources, and is particularly suited to the Grid paradigm that has emerged in large-scale distributed computing. Ganga makes it easy to define a computational task that can be executed locally for debugging and subsequently run on the Grid for large-scale data mining. We have shown how Ganga simplifies task specification, takes care of job submission, monitoring and output retrieval, and provides an intuitive bookkeeping system.

We have demonstrated the advantages of having a well-defined API, which can be used interactively at the Python prompt, through a GUI, or programmatically in scripts. By virtue of its plugin system, Ganga is readily extended and customised to meet the requirements of new user communities. Examples of Ganga usage have been provided in particle physics, medical physics and image processing.

Existing command-line submission interfaces, such as gLite, tend to include only limited usability features. Some higher-level tools, for example GridWay [53], present jobs as if they were Unix processes, with corresponding command-line utilities. Interfaces based on Condor job-submission scripts have also been developed [54]. A distinctive feature of Ganga is that it may easily be adapted to different styles of working, allowing simultaneous use of three different interfaces: the interactive Python prompt, scripts and the GUI. Ganga also provides a higher level of abstraction than most job-management tools, and allows a user to focus on solving domain-specific problems, rather than changing their way of working each time they switch to a new processing system.

Ganga has a large user base and is in active development. It may easily be used to support new scientific or commercial projects on a wide range of distributed infrastructures.

A Examples



Below we give a set of examples of working with Ganga. First we look at a complete Ganga session.

```
~ % ganga
*** Welcome to Ganga ***
Version: Ganga-5-1-0
Documentation and support: http://cern.ch/ganga
Type help() or help('index') for online help.

This is free software (GPL), and you are welcome to redistribute it
under certain conditions; type license() for details.

[1]: j = Job(name='MyJob')          # Create a default job
[2]: j.submit()                     # Submit the job

# wait for the monitoring

[3]: j.peek('stdout')               # Look at the output
[4]: j = j.copy(name='GridJob')     # Make a copy of the job
[5]: j.backend = LCG()              # Change backend to the Grid
[6]: j.submit()                     # Submit the job
[7]: jobs                           # List jobs

... job listing ...

[8]: Exit                           # Quit Ganga
```

In the next example, we create a job for analysis of LHCb data. A splitter is used to divide the analysis between subjobs. Data are assigned using logical identifiers, and the DIRAC WMS ensures that subjobs are sent to locations where the required data are available.

```
[1]: j = Job(application=DaVinci(), backend=Dirac())
[2]: j.inputdata = LHCbDataset(files=[   # Data to read
...      'LFN:/foo.dst',
...      'LFN:/bar.dst',
...      # ... many more data files
...      ])
[3]: j.splitter = DiracSplitter()        # We want subjobs
[4]: j.submit()

... job submission output ...
```

Here, we use the fact that standard Python commands are available at the Ganga prompt to print information on subjobs.

```
# Status of jobs and where they ran
[5]: for subjob in j.subjobs:
...      print subjob.status, subjob.actualCE
...

# Find backend identifier of all failed jobs
[6]: for j in jobs.select(status='failed'):
...      print j.backend.id
...
```

Groups of jobs may be accessed and manipulated using simple methods:



```
[1]: jobs.select(status='failed').resubmit()
[2]: jobs.select(name='testjob').kill()
[3]: newjobs = jobs.select(status='new')
[4]: newjobs.select(name='urgent').submit()
```
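
The bulk operations above follow a simple pattern: `select` filters on job attributes and returns another collection that supports the same methods, so calls can be chained. A minimal sketch of that pattern, using hypothetical `SimpleJob` and `JobSlice` classes rather than Ganga's own job registry, is:

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class SimpleJob:
    id: int
    name: str
    status: str

    def kill(self) -> None:
        self.status = "killed"

    def submit(self) -> None:
        self.status = "submitted"


class JobSlice:
    """A collection of jobs with attribute-based selection and bulk operations."""

    def __init__(self, jobs: Iterable[SimpleJob]) -> None:
        self._jobs: List[SimpleJob] = list(jobs)

    def select(self, **criteria: str) -> "JobSlice":
        # Keep jobs whose attributes equal every given keyword value.
        return JobSlice(j for j in self._jobs
                        if all(getattr(j, k) == v for k, v in criteria.items()))

    def kill(self) -> None:
        for j in self._jobs:
            j.kill()

    def submit(self) -> None:
        for j in self._jobs:
            j.submit()

    def __len__(self) -> int:
        return len(self._jobs)

    def __iter__(self) -> Iterator[SimpleJob]:
        return iter(self._jobs)
```

Because a selection returns the same kind of object, `newjobs.select(name='urgent').submit()` narrows an earlier selection before acting on it; the slice holds references, so operations change the original jobs.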